Soochow University: Description and Analysis of the Chinese Word Sense Induction System for CLP2010

نویسندگان

  • Hua Xu
  • Bing Liu
  • Longhua Qian
  • Guodong Zhou
چکیده

Recent studies on word sense induction (WSI) mainly concentrate on European languages, Chinese word sense induction is becoming popular as it presents a new challenge to WSI. In this paper, we propose a feature-based approach using the spectral clustering algorithm to this problem. We also compare various clustering algorithms and similarity metrics. Experimental results show that our system achieves promising performance in F-score.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Overview of the Chinese Word Sense Induction Task at CLP2010

In this paper, we describe the Chinese word sense induction task at CLP2010. Seventeen teams participated in this task and nineteen system results were submitted. All participant systems are evaluated on a dataset containing 100 target words and 5000 instances using the standard cluster evaluation. We will describe the participating systems and the evaluation results, and then find the most sui...

متن کامل

ISCAS: A System for Chinese Word Sense Induction Based on K-means Algorithm

This paper presents an unsupervised method for automatic Chinese word sense induction. The algorithm is based on clustering the similar words according to the contexts in which they occur. First, the target word which needs to be disambiguated is represented as the vector of its contexts. Then, reconstruct the matrix constituted by the vectors of target words through singular value decompositio...

متن کامل

Improving Word Sense Induction by Exploiting Semantic Relevance

Word Sense Induction (WSI) is the task of automatically inducing the different senses of a target word from unannotated text. Traditional approaches based on the vector space model (VSM) represent each context of a target word as a vector of selected features (e.g. the words occurring in the context). These approaches assume that the words occurring in the context are independent and do not exp...

متن کامل

A Pipeline Approach to Chinese Personal Name Disambiguation

In this paper, we describe our system for Chinese personal name disambiguation task in the first CIPSSIGHAN joint conference on Chinese Language Processing(CLP2010). We use a pipeline approach, in which preprocessing, unrelated documents discarding, Chinese personal name extension and document clustering are performed separately. Chinese personal name extension is the most important part of the...

متن کامل

Word Sense Induction for Machine Translation

We have witnessed the research progress of machine translation from phrase/syntax-based to semanticsbased and from single sentence-based to discourse and document-based. This talk presents our work of word sense-based translation model for statistical machine translation, which is one of semantics-based SMT research at word sense level. The sense in which a word is used determines the translati...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010